Abstract: In this paper, we present a universal scheme for 
transforming an arbitrary algorithm for biased 2-faced coins to 
generate random bits from the general source of an m-sided die, 
hence enabling the application of existing algorithms to general 
sources. In addition, we study approaches of efficiently generating 
a prescribed number of random bits from an arbitrary biased 
coin. This contrasts with most existing works, which typically 
assume that the number of coin tosses is fixed, and they generate 
a variable number of random bits. 

Index Terms — Random Number Generation, Biased Coins, 
Loaded Dice. 



I. Introduction 

In this paper, we study the problem of random number 
generation from i.i.d. sources, which is the most fundamental 
and important source model. Many real sources can 
be well approximated by this model, and the algorithms 
developed based on this model can be further generalized 
to generate random bits from more sophisticated models, 
like Markov chains [13], or more generally, approximately 
stationary ergodic processes [14]. 

The problem of random number generation dates back to 
von Neumann [12] in 1951 who considered the problem of 
simulating an unbiased coin by using a biased coin with 
unknown probability. He observed that when one focuses on 
a pair of coin tosses, the events HT and TH have the same 
probability (H is for 'head' and T is for 'tail'); hence, HT 
produces the output symbol 1 and TH produces the output 
symbol 0. The other two possible events, namely, HH and TT, 
are ignored, namely, they do not produce any output symbols. 
More efficient algorithms for generating random bits from a 
biased coin were proposed by Hoeffding and Simons [4], Elias 
[2], Stout and Warren [10] and Peres [8]. Elias [2] was the first 
to devise an optimal procedure in terms of the information 
efficiency, namely, the expected number of unbiased random 
bits generated per coin toss is asymptotically equal to the 
entropy of the biased coin. In addition, Knuth and Yao [6] 
presented a simple procedure for generating sequences with 
arbitrary probability distributions from an unbiased coin (the 
probability of H and T is 1/2). Han and Hoshi [3] generalized 
this approach and considered the case where the given coin 
has an arbitrary known bias. 

In this paper, we consider the problem of generating random 
bits from a loaded die as a natural generalization of generating 
random bits from a biased coin. There is some related work: 
In [1], Dijkstra considered the opposite question and showed 
how to use a biased coin to simulate a fair die. In [5], Juels 
et al. studied the problem of simulating random bits from 
loaded dice, and their algorithm can be treated as the natural 
generalization of Elias's algorithm. However, for a number 
of known beautiful algorithms, like Peres's algorithm [8], we 
still do not know how to generalize them for larger alphabets 
(loaded dice). 

In addition, we notice that most existing works for biased 
coins take a fixed number of coin tosses as the input and 
generate a variable number of random bits. On some occasions, 
the opposite question seems more reasonable and useful: given 
a biased coin, how to generate a prescribed number of random 
bits with as few coin tosses as possible? Hence, we want to 
create a function $f$ that maps the sequences in a dictionary 
$\mathcal{D}$, whose lengths may be different, to binary sequences 
of the same length. This dictionary $\mathcal{D}$ is complete and 
prefix-free. That means any infinite sequence has exactly one prefix 
in the dictionary. To generate random bits, we read symbols 
from the source until the current input sequence matches one 
in the dictionary. 

For completeness, in this paper, we first present some of 
the existing algorithms that generate random bits from an 
arbitrary biased coin in Section II, including the von Neumann 
scheme, the Elias algorithm and the Peres algorithm. Then in Section 
III, we present a universal scheme for transforming an arbitrary 
algorithm for 2-faced coins to generate random bits from the 
general source of an m-sided die, hence enabling the application 
of existing algorithms to general sources. In Section 
IV, we study approaches of efficiently generating a required 
number of random bits from an arbitrary biased coin and 
achieving the information-theoretic upper bound on efficiency. 
Finally, we provide the concluding remarks in Section V. 

II. Existing Algorithms for Biased Coins 

A. Von Neumann Scheme 

In 1951, von Neumann [12] considered the problem of 
random number generation from biased coins and described 
a simple procedure for generating an independent unbiased 
binary sequence $z_1 z_2 \ldots$ from an input sequence 
$X = x_1 x_2 \ldots$. His original procedure is described as 
follows: For an input sequence, we divide all the bits into pairs 
$x_1 x_2, x_3 x_4, \ldots$ and apply the following mapping to each pair 

$$HT \to 1, \quad TH \to 0, \quad HH \to \phi, \quad TT \to \phi,$$

where $\phi$ denotes the empty sequence. By concatenating the 
outputs of all the pairs, we can get a binary sequence, which 
is independent and unbiased. The von Neumann scheme is 
computationally very fast; however, its information efficiency 
is far from being optimal. Here, the information efficiency is 
defined as the expected number of random bits generated per 
input symbol. Let $p_1, p_2$ with $p_1 + p_2 = 1$ be the probabilities 
of getting H and T, then the probability for a pair of input 
bits to generate one output bit (not a $\phi$) is $2 p_1 p_2$, hence the 
information efficiency is $\frac{2 p_1 p_2}{2} = p_1 p_2$, which is 
$\frac{1}{4}$ at $p_1 = p_2 = \frac{1}{2}$ and less elsewhere. 
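
As an illustration, the pair mapping above can be sketched in Python. The function names `von_neumann` and `efficiency` are ours and only serve this example; tosses are assumed to be given as a string over the letters H and T.

```python
def von_neumann(tosses):
    """Apply HT -> 1, TH -> 0, HH/TT -> (nothing) to non-overlapping pairs."""
    out = []
    for i in range(0, len(tosses) - 1, 2):
        pair = tosses[i:i + 2]
        if pair == "HT":
            out.append("1")
        elif pair == "TH":
            out.append("0")
        # HH and TT produce the empty sequence
    return "".join(out)


def efficiency(p1):
    """Expected output bits per input toss: a pair yields a bit with
    probability 2*p1*p2, and a pair consumes two tosses."""
    p2 = 1 - p1
    return 2 * p1 * p2 / 2


print(von_neumann("HHTHTT"))  # pairs are HH, TH, TT
print(efficiency(0.5))
```

A trailing unpaired toss is simply ignored in this sketch.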

B. Elias Algorithm 

In 1972, Elias [2] proposed an optimal (in terms of infor- 
mation efficiency) algorithm as a generalization of the von 
Neumann scheme. 

Elias's method is based on the following idea: The possible 
$2^n$ binary input sequences of length $n$ can be partitioned into 
classes such that all the sequences in the same class have 
the same number of H's and T's. Note that for every class, 
the members of the class have the same probability to be 
generated. For example, let $n = 4$, we can divide the possible 
$2^n = 16$ input sequences into 5 classes: 

$S_0$ = {HHHH}, 

$S_1$ = {HHHT, HHTH, HTHH, THHH}, 

$S_2$ = {HHTT, HTHT, HTTH, THHT, THTH, TTHH}, 

$S_3$ = {HTTT, THTT, TTHT, TTTH}, 

$S_4$ = {TTTT}. 

Now, our goal is to assign a string of bits (the output) to each 
possible input sequence, such that any two possible output 
sequences $Y$ and $Y'$ with the same length (say $k$), have the 
same probability to be generated, which is $\frac{C_k}{2^k}$ for some 
$0 \le C_k \le 1$. The idea is that for any given class we partition the 
members of the class into sets whose sizes are powers of 2; for 
a set with $2^i$ members (for some $i$) we assign binary strings 
of length $i$. Note that when the class size is odd we have to 
exclude one member of this class. We now demonstrate the 
idea by continuing the example above. 

In the example above, we cannot assign any bits to the 
sequence in $S_0$, so if the input sequence is HHHH, the output 
sequence should be $\phi$ (denoting the empty sequence). There 
are 4 sequences in $S_1$ and we assign the binary strings as 
follows: 

HHHT → 00, HHTH → 01, 

HTHH → 10, THHH → 11. 

Similarly, for $S_2$, there are 6 sequences that can be divided 
into a set of 4 and a set of 2: 

HHTT → 00, HTHT → 01, 

HTTH → 10, THHT → 11, 

THTH → 0, TTHH → 1. 

In general, for a class with $W$ members that were not 
assigned yet, assign $2^j$ possible output binary sequences of 
length $j$ to $2^j$ distinct unassigned members, where 
$2^j \le W < 2^{j+1}$. Repeat the procedure above for the rest of the members 
that were not assigned. When a class has an odd number of 
members, there will be one and only one member assigned to 
$\phi$. 

Given a binary input sequence $X$ of length $n$, using the 
method above, the output sequence can be written as a function 
of $X$, denoted by $\Psi_E(X)$, called the Elias function. In [9], 
Ryabko and Matchikina showed that the Elias function of an 
input sequence of length $n$ (that is generated by a biased coin 
with two faces) is computable in $O(n \log^3 n \log\log n)$ time. 
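
A minimal sketch of the Elias function follows. It is a direct, unoptimized implementation and not the fast method of [9]. It assumes one particular assignment: members of a class are ranked lexicographically with H before T, and blocks of sizes $2^j$ are peeled off from the lowest ranks first, which reproduces the assignment of the $n = 4$ example. The name `elias` is ours.

```python
from math import comb


def elias(x):
    """Elias function for a string over {H, T} (naive sketch)."""
    n = len(x)
    heads = x.count("H")
    class_size = comb(n, heads)  # number of sequences with the same counts

    # Rank of x inside its class, lexicographic order with H < T.
    rank = 0
    heads_left = heads
    for i, c in enumerate(x):
        remaining = n - i - 1
        if c == "T":
            if heads_left > 0:
                # sequences with H at this position come first
                rank += comb(remaining, heads_left - 1)
        else:
            heads_left -= 1

    # Peel off blocks of size 2^j, largest first.
    while class_size > 0:
        j = class_size.bit_length() - 1  # 2^j <= class_size < 2^(j+1)
        if rank < (1 << j):
            return format(rank, "b").zfill(j) if j > 0 else ""
        rank -= 1 << j
        class_size -= 1 << j
    return ""


for seq in ["HHHT", "THHH", "THTH", "TTHH", "HHHH"]:
    print(seq, repr(elias(seq)))
```

Any other fixed ranking of the class members would serve equally well; only the block sizes matter for unbiasedness.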



C. Peres Algorithm 

In 1992, Peres [8] demonstrated that iterating the original 
von Neumann scheme on the discarded information can 
asymptotically achieve optimal information efficiency. Let us 
define the function related to the von Neumann scheme as 
$\Psi_1 : \{H, T\}^* \to \{0,1\}^*$. Then the iterated procedures $\Psi_v$ 
with $v \ge 2$ are defined inductively. Given an input sequence 
$x_1 x_2 \ldots x_{2m}$, let $i_1 < i_2 < \ldots < i_k$ denote all the integers 
$i \le m$ for which $x_{2i} = x_{2i-1}$, then $\Psi_v$ is defined as 

$$\Psi_v(x_1, x_2, \ldots, x_{2m}) = \Psi_1(x_1, x_2, \ldots, x_{2m}) * \Psi_{v-1}(x_1 \oplus x_2, \ldots, x_{2m-1} \oplus x_{2m}) * \Psi_{v-1}(x_{2i_1}, \ldots, x_{2i_k}).$$

Note that on the right-hand side of the equation above, the 
first term corresponds to the random bits generated with the 
von Neumann scheme, the second and third terms relate to 
the symmetric information discarded by the von Neumann 
scheme. For example, when the input sequence is $X$ = 
HHTHTT, the output sequence based on the von Neumann 
scheme is 

$$\Psi_1(HHTHTT) = 0.$$

But based on the Peres scheme, we have the output sequence 

$$\Psi_v(HHTHTT) = \Psi_1(HHTHTT) * \Psi_{v-1}(THT) * \Psi_{v-1}(HT),$$

which is 001, longer than that generated by the von Neumann 
scheme. 

Finally, we can define $\Psi_v$ for sequences of odd length by 

$$\Psi_v(x_1, x_2, \ldots, x_{2m+1}) = \Psi_v(x_1, x_2, \ldots, x_{2m}).$$

Surprisingly, this simple iterative procedure achieves the 
optimal information efficiency asymptotically. The computa- 
tional complexity and memory requirements of this scheme are 
substantially smaller than those of the Elias scheme. However, 
the generalization of this scheme to the case of an m-sided 
die with $m > 2$ is still unknown. 
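
The recursion can be sketched as follows. Here `peres(x, v)` is our name for $\Psi_v$; the XOR of a pair is written as H when the two tosses differ and T when they agree (following the convention H for 1 and T for 0), and inputs shorter than two tosses are assumed to yield the empty sequence.

```python
def peres(x, v):
    """Sketch of the iterated scheme Psi_v on a string over {H, T}."""
    if v < 1 or len(x) < 2:
        return ""
    m = len(x) // 2  # a trailing unpaired toss is dropped
    pairs = [(x[2 * i], x[2 * i + 1]) for i in range(m)]

    # Psi_1: the von Neumann scheme, HT -> 1 and TH -> 0.
    first = "".join("1" if a == "H" else "0" for a, b in pairs if a != b)
    if v == 1:
        return first

    # Discarded information: pairwise XORs, and the value of each equal pair.
    xors = "".join("H" if a != b else "T" for a, b in pairs)
    equal = "".join(a for a, b in pairs if a == b)
    return first + peres(xors, v - 1) + peres(equal, v - 1)


print(peres("HHTHTT", 1))
print(peres("HHTHTT", 2))
```

With `v = 2` the call reproduces the example in the text: the pairs HH, TH, TT give the XOR sequence THT and the equal-pair sequence HT.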
D. Properties 

Let us denote $\Psi : \{H,T\}^n \to \{0,1\}^*$ as a scheme that 
generates independent unbiased sequences from any biased 
coins (with unknown probabilities). Such $\Psi$ can be the von 
Neumann scheme, the Elias scheme, the Peres scheme, or any 
other scheme. Let $X$ be a sequence of biased coin tosses of 
length $n$, then a property of $\Psi$ is that for any $Y \in \{0,1\}^*$ and 
$Y' \in \{0,1\}^*$ with $|Y| = |Y'|$, we have 

$$P[\Psi(X) = Y] = P[\Psi(X) = Y'],$$

i.e., two output sequences of equal length have equal 
probability. 

This observation leads to the following property for $\Psi$. It 
says that given the numbers of H's and T's, the number of 
sequences yielding a binary sequence $Y$ equals the number of 
sequences yielding $Y'$ when $Y$ and $Y'$ have the same length. It 
further implies that given the condition of knowing the number 
of H's and T's in the input sequence, the output sequence of $\Psi$ 
is still independent and unbiased. This property is due to the 
linear independence of probability functions of the sequences 
with different numbers of H's and T's. 

Lemma 1. [13] Let $S_{k_1,k_2}$ be the subset of $\{H, T\}^n$ consisting 
of all sequences with $k_1$ appearances of H and $k_2$ appearances 
of T such that $k_1 + k_2 = n$. Let $B_Y$ denote the set $\{X \mid \Psi(X) = Y\}$. 
Then for any $Y \in \{0,1\}^*$ and $Y' \in \{0,1\}^*$ with $|Y| = |Y'|$, 
we have 

$$|S_{k_1,k_2} \cap B_Y| = |S_{k_1,k_2} \cap B_{Y'}|.$$
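
For small $n$, the counting property of Lemma 1 can be checked by brute force. The sketch below does this for the von Neumann scheme; `check_lemma1` is a hypothetical helper of ours that accepts any scheme given as a function from strings over {H, T} to bit strings.

```python
from collections import Counter
from itertools import product


def von_neumann(tosses):
    """HT -> 1, TH -> 0, HH/TT -> nothing, on non-overlapping pairs."""
    out = []
    for i in range(0, len(tosses) - 1, 2):
        pair = tosses[i:i + 2]
        if pair in ("HT", "TH"):
            out.append("1" if pair == "HT" else "0")
    return "".join(out)


def check_lemma1(scheme, n):
    """True if, within every class of fixed H/T counts, all outputs of the
    same length are produced by the same number of input sequences."""
    for k1 in range(n + 1):
        counts = Counter(
            scheme("".join(s))
            for s in product("HT", repeat=n)
            if s.count("H") == k1
        )
        for length in {len(y) for y in counts}:
            same_len = [c for y, c in counts.items() if len(y) == length]
            # all 2^length strings must occur, each equally often
            if len(same_len) != 2 ** length or len(set(same_len)) != 1:
                return False
    return True


print(all(check_lemma1(von_neumann, n) for n in range(1, 9)))
```

A scheme that is not unbiased for every coin, such as one that outputs the first toss directly, fails this check.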

III. Generalization for Loaded Dice 

In this section, we propose a universal scheme for general- 
izing all the existing algorithms for biased coins such that 
they can deal with loaded dice with more than two sides. 
There is some related work: In [1], Dijkstra considered the 
opposite question and showed how to use a biased coin to 
simulate a fair die. In [5], Juels et al. studied the problem of 
simulating random bits from loaded dice, and their algorithm 
can be treated as the generalization of Elias's algorithm. 
However, for a number of known beautiful algorithms, like 
Peres's algorithm, we still do not know how to generalize 
them for larger alphabets (loaded dice). We propose a universal 
scheme that is able to generalize all the existing algorithms, 
including Elias's algorithm and Peres's algorithm. Compared 
to the other generalizations, this scheme is universal and easier 
to implement, and it preserves the optimality of the original 
algorithm on information efficiency. The brief idea of this 
scheme is that given a loaded die, we can convert it into 
multiple binary sources and apply existing algorithms to these 
binary sources separately. This idea seems natural, but not 
obvious. 

A. An Example 

Let us start from a simple example: Assume we want to 
generate random bits from a sequence $X = 012112210$, which 
is produced by a 3-sided die. Now, we write each symbol (die 
roll) into a binary representation of length two (H for 1 and 
T for 0), so 

0 → TT, 1 → TH, 2 → HT. 

Hence, $X$ can be represented as 

TT,TH,HT,TH,TH,HT,HT,TH,TT. 

Only collecting the first bits of all the symbols yields an 
independent binary sequence 

$X_\phi$ = TTHTTHHTT. 

Collecting the second bits following T, we get another 
independent binary sequence 

$X_T$ = THHHHT. 

Note that although both $X_\phi$ and $X_T$ are independent sequences 
individually, $X_\phi$ and $X_T$ are correlated with each other, since 
the length of $X_T$ is determined by the content of $X_\phi$. 

Let $\Psi$ be any function that generates random bits from a 
fixed number of coin tosses, such as Elias's algorithm and 
Peres's algorithm. We see that both $\Psi(X_\phi)$ and $\Psi(X_T)$ are 
sequences of random bits. But we do not know whether $\Psi(X_\phi)$ 
and $\Psi(X_T)$ are independent of each other since $X_\phi$ and $X_T$ 
are correlated. One of our main contributions is to show that 
concatenating them together, i.e., 

$$\Psi(X_\phi) + \Psi(X_T)$$

still yields a sequence of random bits. 

B. A Universal Scheme 

Generally, given a sequence of symbols generated from an 
m-sided die, written as 

$$X = x_1 x_2 \ldots x_n \in \{0, 1, \ldots, m-1\}^n$$

with the number of states (sides) $m > 2$, we want to convert 
it into a group of binary sequences. To do this, we create a 
binary tree, called a binarization tree, in which each node is 
labeled with a binary sequence of H and T. See Fig. 1 as an 
instance of binarization tree for the above example. Given the 
binary representations of $x_i$ for all $1 \le i \le n$, the path of each 
node in the tree indicates a prefix, and the binary sequence 
labeled at this node consists of all the bits (H or T) following 
the prefix in the binary representations of $x_1, x_2, \ldots, x_n$ (if it 
exists). 

Given the number of sides $m$ of a loaded die, the depth of 
the binarization tree is $b = \lceil \log_2 m \rceil - 1$. At the beginning, the 
binarization tree is a complete binary tree of depth $b$ in which 
each node is labeled with an empty string, then we process 
all the input symbols $x_1, x_2, \ldots, x_n$ one by one. For the $i$th 
symbol, namely $x_i$, its binary representation is of length $b + 1$. 
We add its first bit to the root node. If this bit is T, we add its 
second bit to the left child, otherwise we add its second bit 
to the right child ... repeating this process until all the $b + 1$ 
bits of $x_i$ are added along a path in the tree. Finally, we can 
get the binarization tree of $X$ by processing all the symbols 
in $X$, i.e., $x_1, x_2, \ldots, x_n$. 

Fig. 1. An instance of binarization tree. 
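
The construction can be sketched with a dictionary that maps each prefix to the sequence labeled at the corresponding node. In this sketch (names are ours) the empty string plays the role of the prefix $\phi$, nodes that never receive a bit are simply absent, and every symbol is assumed to lie in $\{0, \ldots, m-1\}$.

```python
def binarization_tree(symbols, m):
    """Return {prefix: label} for the binarization tree of `symbols`."""
    width = max(1, (m - 1).bit_length())  # b + 1 = ceil(log2(m)) bits per symbol
    tree = {}
    for x in symbols:
        bits = format(x, "b").zfill(width).replace("0", "T").replace("1", "H")
        for depth in range(width):
            prefix = bits[:depth]  # path from the root to the node
            tree[prefix] = tree.get(prefix, "") + bits[depth]
    return tree


tree = binarization_tree([0, 1, 2, 1, 1, 2, 2, 1, 0], m=3)
for prefix in sorted(tree, key=len):
    print(repr(prefix), tree[prefix])
```

For the example sequence 012112210 the root label is TTHTTHHTT and the label under the prefix T is THHHHT, as in Section III-A.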

Lemma 2. Given the binarization tree of a sequence 
$X \in \{0, 1, \ldots, m-1\}^n$, we can reconstruct $X$ uniquely. 

Proof: The construction of $X$ from its binarization tree 
can be described as follows: At first, we read the first bit (H 
or T) from the root (once we read a bit, we remove it from the 
current sequence). If it is T, we read the first bit of its left child; 
if it is H, we read the first bit of its right child ... finally we 
reach a leaf, whose path indicates the binary representation 
of $x_1$. Repeating this procedure, we can continue to obtain 
$x_2, x_3, \ldots, x_n$. ■ 

Let $\Upsilon_b$ denote the set consisting of all the binary sequences 
of length at most $b$, i.e., 

$$\Upsilon_b = \{\phi, T, H, TT, TH, HT, HH, \ldots, HHH \ldots HH\}.$$

Given $X \in \{0, 1, \ldots, m-1\}^n$, let $X_\gamma$ denote the binary 
sequence labeled on a node corresponding to a prefix $\gamma$ in 
the binarization tree, then we get a group of binary sequences 

$$X_\phi, X_T, X_H, X_{TT}, X_{TH}, X_{HT}, X_{HH}, \ldots$$

For any function $\Psi$ that generates random bits from a fixed 
number of coin tosses, we can generate random bits from $X$ 
by calculating 

$$\Psi(X_\phi) + \Psi(X_T) + \Psi(X_H) + \Psi(X_{TT}) + \Psi(X_{TH}) + \ldots,$$

where $A + B$ is the concatenation of $A$ and $B$. We call this 
method the generalized scheme of $\Psi$. 
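
A minimal sketch of the generalized scheme, using the von Neumann scheme as the binary algorithm $\Psi$. The helper names are ours; the nodes are visited in the order $\phi$, T, H, TT, TH, ..., and any other binary scheme with the same string interface could be passed in place of `von_neumann`.

```python
def von_neumann(tosses):
    """HT -> 1, TH -> 0, HH/TT -> nothing, on non-overlapping pairs."""
    out = []
    for i in range(0, len(tosses) - 1, 2):
        pair = tosses[i:i + 2]
        if pair in ("HT", "TH"):
            out.append("1" if pair == "HT" else "0")
    return "".join(out)


def binarize(symbols, m):
    """Map each prefix (a string over {T, H}) to its node label."""
    width = max(1, (m - 1).bit_length())
    tree = {}
    for x in symbols:
        bits = format(x, "b").zfill(width).replace("0", "T").replace("1", "H")
        for depth in range(width):
            tree[bits[:depth]] = tree.get(bits[:depth], "") + bits[depth]
    return tree


def generalized(scheme, symbols, m):
    """Concatenate scheme(X_gamma) over all nodes gamma of the tree."""
    tree = binarize(symbols, m)
    # shorter prefixes first; among equal lengths, T before H
    order = sorted(tree, key=lambda g: (len(g), g.replace("T", "0").replace("H", "1")))
    return "".join(scheme(tree[g]) for g in order)


print(generalized(von_neumann, [0, 1, 2, 1, 1, 2, 2, 1, 0], m=3))
```

For the example of Section III-A this concatenates the von Neumann outputs of TTHTTHHTT, THHHHT and TTT.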

We show that the generalized scheme works for any binary 
algorithm $\Psi$ such that it can generate random bits from an 
arbitrary m-sided die. 

Theorem 3. Let $\Psi$ be any function that generates random 
bits from a fixed number of coin tosses. Given a sequence 
$X \in \{0, 1, \ldots, m-1\}^n$ with $m > 2$ generated from an m-sided 
die, the generalized scheme of $\Psi$ generates an independent and 
unbiased sequence. 

The proof of this theorem will be given in the next 
subsection. 

C. Proof of Theorem 3 

Lemma 4. Let $\{X_\gamma\}$ with $\gamma \in \Upsilon_b$ be the binary sequences 
labeled on the binarization tree of $X \in \{0, 1, \ldots, m-1\}^n$ as 
defined above. Assume $X'_\gamma$ is a permutation of $X_\gamma$ for all 
$\gamma \in \Upsilon_b$, then there exists exactly one sequence 
$X' \in \{0, 1, \ldots, m-1\}^n$ such that it yields a binarization tree 
that labels $\{X'_\gamma\}$ with $\gamma \in \Upsilon_b$. 

Proof: Based on $\{X'_\gamma\}$ with $\gamma \in \Upsilon_b$, we can construct the 
corresponding binarization tree and then create the sequence 
$X'$ in the following way (if it exists). At first, we read the first 
bit (H or T) from the root (once we read a bit, we remove it 
from the current sequence). If it is T, we read the first bit of 
its left child; if it is H, we read the first bit of its right child 
... finally we reach a leaf, whose path indicates the binary 
representation of $x'_1$. Repeating this procedure, we continue to 
obtain $x'_2, x'_3, \ldots, x'_n$. Hence, we are able to create the sequence 
$X' = x'_1 x'_2 \ldots x'_{n-1} x'_n$ if it exists. 

It can be proved that the sequence $X'$ can be successfully 
constructed if and only if the following condition is satisfied: 
For any $\gamma \in \Upsilon_{b-1}$, 

$$w_T(X'_\gamma) = |X'_{\gamma T}|, \quad w_H(X'_\gamma) = |X'_{\gamma H}|,$$

where $w_T(X)$ counts the number of T's in $X$ and $w_H(X)$ 
counts the number of H's in $X$. 

Obviously, the binary sequences $\{X_\gamma\}$ with $\gamma \in \Upsilon_b$ satisfy 
the above condition. Permuting them into $\{X'_\gamma\}$ with $\gamma \in \Upsilon_b$ 
does not violate this condition. Hence, we can always construct 
a sequence $X' \in \{0, 1, \ldots, m-1\}^n$, which yields $\{X'_\gamma\}$ with 
$\gamma \in \Upsilon_b$. 

This completes the proof. ■ 

Now, we divide all the possible input sequences in 
$\{0, 1, \ldots, m-1\}^n$ into classes. Two sequences $X, X' \in 
\{0, 1, \ldots, m-1\}^n$ are in the same class if and only if the 
binary sequences obtained from $X$ and $X'$ are permutations 
of each other, i.e., $X'_\gamma$ is a permutation of $X_\gamma$ for all 
$\gamma \in \Upsilon_b$. Here, we use $\mathcal{G}$ to denote the set consisting 
of all such classes. 

Lemma 5. All the sequences in a class $G \in \mathcal{G}$ have the same 
probability of being generated. 

Proof: Based on the probability distribution of each die 
roll $\{p_0, p_1, \ldots, p_{m-1}\}$, we can get a group of conditional 
probabilities, denoted as 

$$q_{T|\phi}, q_{H|\phi}, q_{T|T}, q_{H|T}, q_{T|H}, q_{H|H}, q_{T|TT}, q_{H|TT}, \ldots,$$

where $q_{a|\gamma}$ is the conditional probability of generating a die 
roll $x_i$ such that in its binary representation the bit following 
a prefix $\gamma$ is $a$. 

Note that $q_{T|\gamma} + q_{H|\gamma} = 1$ for all $\gamma \in \Upsilon_b$. For example, if 
$\{p_0, p_1, p_2\} = \{0.2, 0.3, 0.5\}$, then 

$$q_{T|\phi} = 0.5, \quad q_{T|T} = 0.4, \quad q_{T|H} = 1.$$

It can be proved that the probability of generating a 
sequence $X \in \{0, 1, \ldots, m-1\}^n$ equals 

$$\prod_{\gamma \in \Upsilon_b} q_{T|\gamma}^{w_T(X_\gamma)} \, q_{H|\gamma}^{w_H(X_\gamma)},$$

where $w_T(X)$ counts the number of T's in $X$ and $w_H(X)$ 
counts the number of H's in $X$. This probability stays 
unchanged when we permute $X_\gamma$ to $X'_\gamma$ for all $\gamma \in \Upsilon_b$. 

This implies that all the elements in $G$ have the same 
probability of being generated. ■ 
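
The conditional probabilities in this proof can be computed directly from the die distribution. The sketch below uses exact fractions and our own helper names, and it assumes that symbol $i$ is encoded by the $(b+1)$-bit binary representation of $i$ with T for 0 and H for 1; the key `(a, gamma)` stands for $q_{a|\gamma}$ and the empty string for $\phi$.

```python
from fractions import Fraction


def conditional_bit_probs(p):
    """Return ({(a, gamma): q_{a|gamma}}, codes) for die probabilities p."""
    m = len(p)
    width = max(1, (m - 1).bit_length())
    codes = [format(i, "b").zfill(width).replace("0", "T").replace("1", "H")
             for i in range(m)]
    mass = {}  # total probability of the symbols whose code starts with a prefix
    for code, pi in zip(codes, p):
        for d in range(width + 1):
            mass[code[:d]] = mass.get(code[:d], 0) + pi
    q = {}
    for prefix, total in mass.items():
        if len(prefix) < width and total > 0:
            for a in "TH":
                q[(a, prefix)] = mass.get(prefix + a, 0) / total
    return q, codes


def sequence_probability(symbols, p):
    """Probability of a die sequence as a product of conditional bit probabilities."""
    q, codes = conditional_bit_probs(p)
    prob = Fraction(1)
    for x in symbols:
        code = codes[x]
        for d, a in enumerate(code):
            prob *= q[(a, code[:d])]
    return prob


p = [Fraction(2, 10), Fraction(3, 10), Fraction(5, 10)]
q, _ = conditional_bit_probs(p)
print(q[("T", "")], q[("T", "T")], q[("T", "H")])
print(sequence_probability([0, 1, 2], p))
```

Because the product depends only on how many T's and H's each node label contains, permuting any $X_\gamma$ leaves the result unchanged.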
Lemma 6. Let $\Psi$ be any function that generates random bits 
from a fixed number of coin tosses. Given $Z_\gamma, Z'_\gamma \in \{0,1\}^*$ 
for all $\gamma \in \Upsilon_b$, we define 

$$S = \{X \mid \forall \gamma \in \Upsilon_b, \Psi(X_\gamma) = Z_\gamma\},$$
$$S' = \{X \mid \forall \gamma \in \Upsilon_b, \Psi(X_\gamma) = Z'_\gamma\}.$$

If $|Z_\gamma| = |Z'_\gamma|$ for all $\gamma \in \Upsilon_b$, i.e., $Z_\gamma$ and $Z'_\gamma$ have the same 
length, then for all $G \in \mathcal{G}$, 

$$|G \cap S| = |G \cap S'|,$$

i.e., $G \cap S$ and $G \cap S'$ have the same size. 

Proof: We prove that for any $\theta \in \Upsilon_b$, if $Z_\gamma = Z'_\gamma$ for all 
$\gamma \ne \theta$ and $|Z_\theta| = |Z'_\theta|$, then 

$$|G \cap S| = |G \cap S'|.$$

If this statement is true, we can obtain the conclusion in the 
lemma by replacing $Z_\gamma$ with $Z'_\gamma$ one by one for all $\gamma \in \Upsilon_b$. 

In the class $G$, assume $|X_\theta| = n_\theta$. Let us define $G_\theta$ as the 
subset of $\{H, T\}^{n_\theta}$ consisting of all the permutations of $X_\theta$. 
We also define 

$$S_\theta = \{X_\theta \mid \Psi(X_\theta) = Z_\theta\},$$
$$S'_\theta = \{X_\theta \mid \Psi(X_\theta) = Z'_\theta\}.$$

According to Lemma 1, if $\Psi$ can generate random bits from 
an arbitrary biased coin, then 

$$|G_\theta \cap S_\theta| = |G_\theta \cap S'_\theta|.$$

This implies that the elements in $G_\theta \cap S_\theta$ and those in 
$G_\theta \cap S'_\theta$ can be put in one-to-one correspondence. 

Based on this result, we are ready to show that the elements 
in $G \cap S$ and those in $G \cap S'$ are in one-to-one correspondence: For any 
sequence $X$ in $G \cap S$, we get a series of binary sequences 
$\{X_\gamma\}$ with $\gamma \in \Upsilon_b$. Given $Z'_\theta$ with $|Z'_\theta| = |Z_\theta|$, we can 
find the (one-to-one) image of $X_\theta$ in $G_\theta \cap S'_\theta$, denoted by 
$X'_\theta$. Here, $X'_\theta$ is a permutation of $X_\theta$. According to Lemma 
4, there exists exactly one sequence $X' \in \{0, 1, \ldots, m-1\}^n$ 
such that it yields $\{X_\phi, X_T, X_H, \ldots, X'_\theta, \ldots\}$. Hence, 
for any sequence $X$ in $G \cap S$, we can always find its 
one-to-one image $X'$ in $G \cap S'$, which implies that 

$$|G \cap S| = |G \cap S'|.$$

This completes the proof. ■ 

Based on the lemma above, we get Theorem 3. 

Theorem 3. Let $\Psi$ be any function that generates random 
bits from a fixed number of coin tosses. Given a sequence 
$X \in \{0, 1, \ldots, m-1\}^n$ with $m > 2$ generated from an m-sided 
die, the generalized scheme of $\Psi$ generates an independent and 
unbiased sequence. 

Proof: In order to prove that the binary sequence generated 
is independent and unbiased, we show that any two 
sequences $Y_1, Y_2 \in \{0,1\}^k$ have the same probability to 
be generated. Hence, each binary sequence of length $k$ can be 
generated with probability $\frac{C_k}{2^k}$ for some $0 \le C_k \le 1$. 



First, we let $f : \{0, 1, \ldots, m-1\}^n \to \{0,1\}^*$ be the function 
of the generalized scheme of $\Psi$, then we write 

$$P[f(X) = Y_1] = \sum_{G \in \mathcal{G}} P[f(X) = Y_1, X \in G].$$

According to Lemma 5, all the elements in $G$ have the 
same probability of being generated. Hence, we denote this 
probability as $p_G$, and the formula above can be written as 

$$P[f(X) = Y_1] = \sum_{G \in \mathcal{G}} p_G \, |\{X \in G, f(X) = Y_1\}|.$$

Let $Z_\gamma \in \{0,1\}^*$ be the sequence of bits generated from the 
node corresponding to $\gamma$ for all $\gamma \in \Upsilon_b$, then 
$Y_1 = \sum_{\gamma \in \Upsilon_b} Z_\gamma$. We get that $P[f(X) = Y_1]$ equals 

$$\sum_{G \in \mathcal{G}} \sum_{\{Z_\gamma : \gamma \in \Upsilon_b\}} p_G \, |\{X \in G, \forall \gamma \in \Upsilon_b, \Psi(X_\gamma) = Z_\gamma\}| \; I_{\sum_{\gamma \in \Upsilon_b} Z_\gamma = Y_1},$$

where $I_{\sum_{\gamma \in \Upsilon_b} Z_\gamma = Y_1} = 1$ if and only if 
$\sum_{\gamma \in \Upsilon_b} Z_\gamma = Y_1$, otherwise it is zero. 

Similarly, $P[f(X) = Y_2]$ equals 

$$\sum_{G \in \mathcal{G}} \sum_{\{Z'_\gamma : \gamma \in \Upsilon_b\}} p_G \, |\{X \in G, \forall \gamma \in \Upsilon_b, \Psi(X_\gamma) = Z'_\gamma\}| \; I_{\sum_{\gamma \in \Upsilon_b} Z'_\gamma = Y_2}.$$

If $|Z'_\gamma| = |Z_\gamma|$ for all $\gamma \in \Upsilon_b$, then based on Lemma 6 we 
can get 

$$|\{X \in G, \forall \gamma \in \Upsilon_b, \Psi(X_\gamma) = Z_\gamma\}| = |\{X \in G, \forall \gamma \in \Upsilon_b, \Psi(X_\gamma) = Z'_\gamma\}|.$$

Substituting it into the expressions of $P[f(X) = Y_1]$ and 
$P[f(X) = Y_2]$ shows 

$$P[f(X) = Y_1] = P[f(X) = Y_2].$$

So we can conclude that any binary sequences of the 
same length have the same probability of being generated. 
Furthermore, we can conclude that the bits generated are 
independent and unbiased. 

This completes the proof. ■ 

D. Optimality 

In this subsection, we show that the universal scheme 
keeps the optimality of original algorithms, i.e., if the binary 
algorithm is asymptotically optimal, like Elias's algorithm or 
Peres's algorithm, its generalized version is also asymptoti- 
cally optimal. Here, we say an algorithm is asymptotically 
optimal if and only if the number of random bits generated 
per input symbol is asymptotically equal to the entropy of an 
input symbol. 

Theorem 7. Given an m-sided die with probability 
distribution $\rho = (p_0, p_1, \ldots, p_{m-1})$, let $n$ be the number of 
symbols (dice rolls) used in the generalized scheme of $\Psi$ 
and let $k$ be the number of random bits generated. If $\Psi$ is 
asymptotically optimal, then the generalized scheme of $\Psi$ is 
also asymptotically optimal, that means 

$$\lim_{n \to \infty} \frac{E[k]}{n} = H(\rho),$$

where 

$$H(\rho) = H(p_0, p_1, \ldots, p_{m-1}) = \sum_{i=0}^{m-1} p_i \log_2 \frac{1}{p_i}$$

is the entropy of the m-sided die. 



Proof: We prove this by induction. Using the same 
notations as above, we have the depth of the binarization tree 
$b = \lceil \log_2 m \rceil - 1$. If $b = 0$, i.e., $m \le 2$, the algorithm is 
exactly $\Psi$. Hence, it is asymptotically optimal on efficiency. 
Now, assume that the conclusion holds for any integer $b - 1$, 
we show that it also holds for the integer $b$. 

Since the length-$(b+1)$ binary representations of 
$\{0, 1, \ldots, 2^b - 1\}$ start with 0, the probability for a symbol 
starting with 0 is 

$$q_0 = \sum_{i=0}^{2^b - 1} p_i.$$

In this case, the conditional probability distribution of these 
symbols is 

$$\left\{ \frac{p_0}{q_0}, \frac{p_1}{q_0}, \ldots, \frac{p_{2^b-1}}{q_0} \right\}.$$

Similarly, let 

$$q_1 = \sum_{i=2^b}^{m-1} p_i,$$

then the conditional probability distribution of the symbols 
starting with 1 is 

$$\left\{ \frac{p_{2^b}}{q_1}, \frac{p_{2^b+1}}{q_1}, \ldots, \frac{p_{m-1}}{q_1} \right\}.$$

When $n$ is large enough, the number of symbols starting 
with 0 approaches $n q_0$ and the number of symbols starting 
with 1 approaches $n q_1$. According to our assumption for $b - 1$, 
the total number of random bits generated approaches 

$$n H(q_0, q_1) + n q_0 H\!\left(\frac{p_0}{q_0}, \frac{p_1}{q_0}, \ldots, \frac{p_{2^b-1}}{q_0}\right) + n q_1 H\!\left(\frac{p_{2^b}}{q_1}, \frac{p_{2^b+1}}{q_1}, \ldots, \frac{p_{m-1}}{q_1}\right),$$

which equals 

$$n q_0 \log_2 \frac{1}{q_0} + n q_1 \log_2 \frac{1}{q_1} + n q_0 \sum_{i=0}^{2^b-1} \frac{p_i}{q_0} \log_2 \frac{q_0}{p_i} + n q_1 \sum_{i=2^b}^{m-1} \frac{p_i}{q_1} \log_2 \frac{q_1}{p_i}$$
$$= n \sum_{i=0}^{m-1} p_i \log_2 \frac{1}{p_i}$$
$$= n H(p_0, p_1, \ldots, p_{m-1}).$$

This completes the proof. ■ 

IV. Efficient Generation of k Random Bits 

A. Motivation 

Most existing works on random bits generation from biased 
coins aim at maximizing the expected number of random 
bits generated from a fixed number of coin tosses. Falling 
into this category, Peres's scheme and Elias's scheme are 
asymptotically optimal for generating random bits. However, 
in these methods, the number of random bits generated is a 
random variable. On some occasions, we prefer to generate 
a prescribed number of random bits, which motivates 
the opposite question: fixing the number of random bits to 
generate, i.e., k bits, how can we minimize the expected 
number of coin tosses? This question is equally important 
as the original one, since in many applications a prescribed 
number of random bits are required while the source is usually 
a stream of coin tosses instead of a sequence of fixed length. 
But the existing study on this question is very limited. 

To generate k random bits, we are always able to make use 
of the existing schemes with fixed input length and variable 
output length like Peres's scheme or Elias's scheme. For 
example, we can keep reading n tosses (H or T) for several 
times and concatenate their outputs until the total number of 
random bits generated is slightly larger than k. However, if 
n is small, this approach is less information efficient. If n is 
large, this approach may generate too many extra random bits, 
which can be treated as a waste. In this section, we propose 
an algorithm to generate exactly k random bits efficiently. 
It is motivated by Elias's scheme. It can be proved that 
this algorithm is asymptotically optimal, namely, the expected 
number of coin tosses required per random bit generated is 
asymptotically equal to one over the entropy of the biased 
coin. 

B. An Iterative Scheme 

It is not easy to generate $k$ random bits directly from 
a biased coin with very high information efficiency. Our 
approach of achieving this goal is to generate random bits 
iteratively - we first produce $m \le k$ random bits, where $m$ 
is a variable number that is equal to or close to $k$ with very 
high probability. In the next step, instead of trying to generate $k$ 
random bits, we try to generate $k - m$ random bits ... we repeat 
this procedure until generating $k$ random bits in total. 

How can we generate $m$ random bits from a biased coin 
such that $m$ is a variable number that is equal to or very close 
to $k$? Our idea is to construct a group of disjoint prefix sets, 
denoted by $S_1, S_2, \ldots, S_w$, such that (1) all the sequences in 
a prefix set $S_i$ with $1 \le i \le w$ have the same probability 
of being generated, and (2) $S = S_1 \cup S_2 \cup \ldots \cup S_w$ forms a 
stopping set, namely, we can always get a sequence in $S$ (or 
with probability almost 1) when keeping reading tosses from 
a biased coin. For example, we can let 

$S_1$ = {HH, HT}, 
$S_2$ = {THH, TTT}, 
$S_3$ = {THT, TTH}. 

Then $S = S_1 \cup S_2 \cup S_3$ forms a stopping set, which is 
complete and prefix-free. 
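
Completeness and prefix-freeness of a finite stopping set can be checked mechanically: no word may be a proper prefix of another, and the probabilities of the words must sum to 1 for a coin with heads probability strictly between 0 and 1. The sketch below (helper names are ours) checks only these two properties for the example set above.

```python
from fractions import Fraction


def is_prefix_free(words):
    """True if no word is a proper prefix of another word."""
    return not any(a != b and b.startswith(a) for a in words for b in words)


def total_probability(words, p_heads):
    """Probability that an i.i.d. toss stream begins with one of the words."""
    return sum(
        p_heads ** w.count("H") * (1 - p_heads) ** w.count("T") for w in words
    )


S = ["HH", "HT", "THH", "TTT", "THT", "TTH"]
print(is_prefix_free(S))
print(total_probability(S, Fraction(1, 3)))
```

A total of exactly 1 together with prefix-freeness means that every infinite toss sequence has exactly one prefix in the set.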
In the scheme, we let all the sequences in $S_i$ for all $1 \le i \le w$ 
have the same probability, i.e., $S_i$ consists of sequences with 
the same number of H's and T's. We select criteria carefully 
such that $|S_i|$ is slightly larger than $2^k$. Similarly to Elias's 
original scheme, we assign output binary sequences to all the 
members in $S_i$ for all $1 \le i \le w$. Let $W$ be the number of 
members that were not assigned yet in a prefix set, then $2^j$ 
possible output binary sequences of length $j$ are assigned to 
$2^j$ distinct unassigned members, where $j = k$ if $W \ge 2^k$ and 
$2^j \le W < 2^{j+1}$ if $W < 2^k$. We repeat the procedure above 
for the rest of the members that were not assigned. 

Theorem 8. The above method generates $m$ random bits for 
some $m$ with $0 \le m \le k$. 

Proof: It is easy to see that the above method never 
generates a binary sequence longer than $k$. We only need to 
prove that any binary sequences $Y, Y' \in \{0,1\}^m$ 
have the same probability of being generated. 

Let $f$ denote the function corresponding to the above 
method. Then 

$$P[f(X) = Y] = \sum_{i=1}^{w} P[X \in S_i] \, P[f(X) = Y \mid X \in S_i].$$

Given $X \in S_i$, we have $P[f(X) = Y \mid X \in S_i] = P[f(X) = 
Y' \mid X \in S_i]$, which supports our claim that any two binary 
sequences of the same length have the same probability of 
being generated. ■ 

The next question is how to construct such prefix sets 
$S_1, S_2, \ldots, S_w$. Let us first consider the construction of their 
union, i.e., the stopping set $S$. Given a biased coin, we design 
an algorithm that keeps reading coin tosses until it 
meets the first input sequence that satisfies some criterion. For 
instance, let $k_1$ be the number of H's and $k_2$ be the number 
of T's in the current input sequence, one possible choice is 
to read coin tosses until we get the first sequence such that 
$\binom{k_1+k_2}{k_1} \ge 2^k$. Such an input sequence is a member of the 
stopping set $S$. However, this criterion is not the best one that 
we can have, since it will introduce too many iterations to 
generate $k$ random bits. To reduce the number of iterations, 
we hope that the size of each prefix set, say $S_i$, is slightly 
larger than $2^k$. As a result, we use the following stopping set: 

$$S = \left\{ \text{the first sequence s.t. } \binom{k_1+k_2}{k_1} \ge \frac{2^k (k_1+k_2)}{\min(k_1, k_2)} \right\}.$$

Later, we will show that the selection of such a stopping set 
can make the number of iterations very small. 

Now we divide all the sequences in the stopping set $S$ into
different classes, i.e., the prefix sets $S_1, S_2, \ldots, S_w$, such
that each prefix set consists of the sequences with the same number
of H's and T's. Assume $S_{k_1,k_2}$ is a nonempty prefix set that
consists of sequences with $k_1$ H's and $k_2$ T's, then

$$S_{k_1,k_2} = G_{k_1,k_2} \cap S,$$

where $G_{k_1,k_2}$ is the set consisting of all the sequences with
$k_1$ H's and $k_2$ T's. According to the stopping set constructed
above, we have

$$S_{k_1,k_2} = \left\{x \in G_{k_1,k_2} \;\middle|\; \binom{k_1+k_2-1}{k_1'} < \frac{2^k (k_1+k_2-1)}{\min(k_1',k_2')}\right\},$$

where $k_1'$ is the number of H's in $x$ without considering the last
symbol and $k_2'$ is the number of T's in $x$ without considering
the last symbol. So if the last symbol of $x$ is H, then
$k_1' = k_1 - 1$, $k_2' = k_2$; if the last symbol of $x$ is T, then
$k_1' = k_1$, $k_2' = k_2 - 1$. According to the expression of
$S_{k_1,k_2}$, we see that the sequences in a prefix set are not
prefixes of sequences in another prefix set. Furthermore, we can prove
that the size of each prefix set is at least $2^k$.

Lemma 9. If $S_{k_1,k_2} \neq \phi$, then $|S_{k_1,k_2}| \ge 2^k$.

Proof: Without loss of generality, we assume that $k_1 \le k_2$,
hence $\binom{k_1+k_2}{k_1} \ge \frac{2^k (k_1+k_2)}{k_1}$. It also
implies $k_1 \ge 1$. To prove $|S_{k_1,k_2}| \ge 2^k$, we show that
$S_{k_1,k_2}$ includes all the sequences $x \in G_{k_1,k_2}$ ending
with H. If $x \in G_{k_1,k_2}$ ending with H does not belong to
$S_{k_1,k_2}$, then

$$\binom{k_1+k_2-1}{k_1'} \ge \frac{2^k (k_1+k_2-1)}{k_1'}.$$

From which, we can get

$$\binom{k_1+k_2-1}{k_1} \ge \frac{2^k (k_1+k_2-1)}{k_1-1} \ge \frac{2^k (k_1+k_2-1)}{\min(k_1,k_2-1)}.$$

It further implies that all the sequences $x \in G_{k_1,k_2}$ ending
with T are also not members of $S_{k_1,k_2}$. So $S_{k_1,k_2}$ is
empty, which is a contradiction.

The number of sequences $x \in G_{k_1,k_2}$ ending with H is

$$\binom{k_1+k_2-1}{k_1-1} = \binom{k_1+k_2}{k_1}\frac{k_1}{k_1+k_2},$$

which is at least $2^k$ by the inequality assumed at the start of the
proof. So the size of $S_{k_1,k_2}$ is at least $2^k$ if
$S_{k_1,k_2} \neq \phi$. This completes the proof. ■

Based on the construction of prefix sets, we can get an
algorithm $\Phi_k$ for generating $m$ random bits with $0 \le m \le k$,
described as follows.

Algorithm $\Phi_k$

Input: A stream of biased coin tosses.
Output: $m$ bits with $0 \le m \le k$.

(1) Reading coin tosses until there are $k_1$ H's and $k_2$ T's
for some $k_1$ and $k_2$ such that

$$\binom{k_1+k_2}{k_1} \ge \frac{2^k (k_1+k_2)}{\min(k_1,k_2)}.$$

(2) Let $X$ denote the current input sequence of coin tosses.
If the last coin toss is H, we let $k_1' = k_1 - 1$, $k_2' = k_2$;
otherwise, we let $k_1' = k_1$, $k_2' = k_2 - 1$. We remove this
coin toss from $X$ if



ki + k 2 



l \ 2 k (k 1 +k 2 - 1) 



min(fc' 1 , k' 2 ) 



s 



(3) Let $\Psi_E$ denote the Elias's function$^1$ for generating
random bits from a fixed number of coin tosses. A fast
computation of $\Psi_E$ was provided by Ryabko and Matchikina
in [9]. The output of the algorithm is $\Psi_E(X)$ or
the last $k$ bits of $\Psi_E(X)$ if $\Psi_E(X)$ is longer than $k$.
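
To make step (3) concrete, the following is a minimal Python sketch of an Elias-style function for a fixed-length input, followed by the truncation to the last $k$ bits. It is an illustration under assumptions, not the fast computation of [9]: the helper names `rank_in_class`, `elias_bits` and `truncate_output` are hypothetical, sequences are strings over 'H' and 'T', and the ordering inside a class (lexicographic with 'H' before 'T') is an arbitrary choice. The idea is to rank the input among all sequences with the same number of H's, split that class into blocks whose sizes are the powers of two in the binary expansion of the class size, and output the index inside the block as a bit string.

```python
from math import comb


def rank_in_class(seq):
    """Lexicographic rank ('H' < 'T') of seq among all sequences with the
    same length and the same number of H's."""
    n = len(seq)
    heads_left = seq.count("H")
    rank = 0
    for i, toss in enumerate(seq):
        if toss == "H":
            heads_left -= 1
        elif heads_left > 0:
            # Sequences sharing this prefix but with 'H' at position i come first.
            rank += comb(n - i - 1, heads_left - 1)
    return rank


def elias_bits(seq):
    """Map seq to a bit string that is uniform given its length."""
    size = comb(len(seq), seq.count("H"))
    r = rank_in_class(seq)
    # Walk the binary expansion of the class size from the largest block down.
    for j in range(size.bit_length() - 1, -1, -1):
        if (size >> j) & 1:
            if r < (1 << j):
                return format(r, "b").zfill(j) if j else ""
            r -= 1 << j
    raise AssertionError("rank must be smaller than the class size")


def truncate_output(bits, k):
    """Keep only the last k bits when the output is longer than k."""
    return bits[len(bits) - k:] if len(bits) > k else bits
```

For the six sequences with two H's and two T's, four of them receive the 2-bit strings 00, 01, 10, 11 and the remaining two receive the 1-bit strings 0 and 1, so every output of a given length is equally likely.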

According to Lemma 9, we can easily get the following
conclusion.

Corollary 10. The algorithm $\Phi_k$ generates $m$ random bits for
some $m$ with $0 \le m \le k$, and $m = k$ with probability at least
1/2.

Proof: The sequence generated by $\Phi_k$ is independent
and unbiased. This conclusion is immediate from Lemma 9.
Assume that the input sequence $x \in S_i$ for some $i$ with
$1 \le i \le w$, then the probability of $m = k$ is

$$\frac{2^k \lfloor |S_i| / 2^k \rfloor}{|S_i|},$$

which is at least 1/2 based on the fact that $|S_i| \ge 2^k$. Since
this conclusion is true for all $S_i$ with $1 \le i \le w$, we can claim
that $m = k$ with probability at least 1/2. ■
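
Lemma 9 and Corollary 10 can be checked numerically for small $k$. The sketch below is an illustration only, with hypothetical names (`prefix_set_sizes`, `prob_full_output`): it counts, by dynamic programming over the pair of counts, how many sequences reach each $(k_1, k_2)$ without having met the stopping criterion earlier, which gives $|S_{k_1,k_2}|$ for every nonempty prefix set of length at most `max_len`. It assumes that a uniform class of size $s$ yields $k$ bits with probability $2^k \lfloor s/2^k \rfloor / s$.

```python
from math import comb


def meets_criterion(k1, k2, k):
    """C(k1+k2, k1) >= 2^k (k1+k2) / min(k1, k2), unmet when min is 0."""
    m = min(k1, k2)
    return m > 0 and comb(k1 + k2, k1) * m >= (1 << k) * (k1 + k2)


def prefix_set_sizes(k, max_len):
    """Return {(k1, k2): |S_{k1,k2}|} for nonempty sets with k1+k2 <= max_len."""
    alive = {(0, 0): 1}  # number of unstopped sequences per pair of counts
    sizes = {}
    for _ in range(max_len):
        reached = {}
        for (a, b), count in alive.items():
            for key in ((a + 1, b), (a, b + 1)):
                reached[key] = reached.get(key, 0) + count
        alive = {}
        for key, count in reached.items():
            if meets_criterion(key[0], key[1], k):
                sizes[key] = count  # these sequences stop here
            else:
                alive[key] = count  # these sequences keep reading
    return sizes


def prob_full_output(size, k):
    """Fraction of a uniform class of the given size that yields k bits."""
    return ((size >> k) << k) / size
```

For $k = 3$, for instance, the prefix set with three H's and three T's has 20 members, so it produces three bits with probability 16/20.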

Since the algorithm $\Phi_k$ generates $m$ random bits for some $m$
with $0 \le m \le k$ from an arbitrary biased coin, we are able to
generate $k$ bits iteratively: after generating $m$ random bits, we
apply the algorithm $\Phi_{k-m}$ for generating $k - m$ bits. Repeating
this procedure, the total number of random bits generated will
converge to $k$ very quickly. We call this scheme an iterative
scheme for generating $k$ random bits.
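
The iterative scheme is a loop around any implementation of $\Phi_j$. The following Python sketch shows only the control flow; the function name `iterative_scheme` and the callable `phi` are hypothetical, and `phi(tosses, j)` is assumed to consume tosses from an iterator and return a string of at most `j` unbiased bits.

```python
def iterative_scheme(phi, tosses, k):
    """Collect exactly k bits by calling phi on the remaining deficit.

    phi(tosses, j) is an assumed interface: it reads from the iterator
    `tosses` and returns a bit string of length at most j.
    Returns the k bits and the number of iterations used.
    """
    bits, iterations = "", 0
    while len(bits) < k:
        remaining = k - len(bits)
        out = phi(tosses, remaining)
        if len(out) > remaining:
            raise ValueError("phi returned more bits than requested")
        bits += out
        iterations += 1
    return bits, iterations
```

The loop terminates as soon as one call returns all the bits still missing, which is the event whose probability Corollary 10 bounds from below.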

To generate $k$ random bits, we do not want to iterate $\Phi_k$ too
many times. Fortunately, in the following theorem, we show
that in our scheme the expected number of iterations is upper
bounded by the constant 2.

Theorem 11. The expected number of iterations in the iterative
scheme for generating $k$ random bits is at most 2.

Proof: According to Corollary 10, $\Phi_k$ generates $m = k$
random bits with probability at least 1/2. Hence, the scheme
stops at each iteration with probability at least 1/2. Following
this fact, the result in the theorem is immediate. ■
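
One way to spell out the last step is the following short derivation. It assumes that Corollary 10 applies at every iteration with the number of bits still missing in place of $k$, so that each iteration ends the scheme with probability at least 1/2 whatever happened before; the symbol $I$ for the number of iterations is introduced here for illustration.

```latex
\[
  E[I] \;=\; \sum_{j=1}^{\infty} \Pr[I \ge j]
       \;\le\; \sum_{j=1}^{\infty} \left(\frac{1}{2}\right)^{j-1}
       \;=\; 2 ,
\]
% Pr[I >= j] is the probability that the first j-1 iterations all fail
% to deliver the missing bits, which is at most (1/2)^{j-1}.
```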



C. Optimality 

In this subsection, we study the information efficiency of the 
iterative scheme and show that this scheme is asymptotically 
optimal. 

Lemma 12. Given a biased coin with probability $p$ being H,
let $n$ be the number of coin tosses used by the algorithm $\Phi_k$,
then

$$\lim_{k\to\infty} \frac{E[n]}{k} \le \frac{1}{H(p)}.$$

$^1$ Here, an arbitrary algorithm for generating random bits from a fixed
number of coin tosses works.

Proof: We consider the probability of having an input
sequence of length at least $n$, denoted as $P_n$. In this case, we
can write $n = k_1 + k_2$, where $k_1$ is the number of H's and
$k_2$ is the number of T's. According to the construction of the
stopping set,

$$\binom{n-1}{\min(k_1,k_2)-1} < \frac{2^k (n-1)}{\min(k_1,k_2)-1}.$$

Or we can write it as

$$\binom{n-2}{\min(k_1,k_2)-2} < 2^k.$$

Hence, we get an upper bound for $\min(k_1,k_2)$, which is

$$t_n = \max\left\{i \in \{0,1,\ldots,n\} \;\middle|\; \binom{n-2}{i-2} < 2^k\right\}. \tag{1}$$



Note that if (l_ 2 ) > 2 fc , then t n is a nondecreasing function 
of n. 

According to the symmetry of our criteria, we can get

$$P_n \le \sum_{i=0}^{t_n} \left(p^i (1-p)^{n-i} + (1-p)^i p^{n-i}\right)\binom{n}{i}.$$

For convenience, we write

$$Q_n = \sum_{i=0}^{t_n} \left(p^i (1-p)^{n-i} + (1-p)^i p^{n-i}\right)\binom{n}{i},$$

then $P_n \le Q_n$ and $Q_n$ is also a nondecreasing function of $n$.

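Now, we are ready to calculate the expected number of coin
tosses required, which equals

$$\begin{aligned}
E[n] &= \sum_{n=1}^{\infty} (P_n - P_{n+1})\, n = \sum_{n=1}^{\infty} P_n \\
&\le \sum_{n=1}^{\frac{k}{H(p)}(1+\epsilon)} P_n
 + \sum_{n=\frac{k}{H(p)}(1+\epsilon)}^{\frac{2k}{H(p)}(1+\epsilon)} Q_n
 + \sum_{n=\frac{2k}{H(p)}(1+\epsilon)}^{\infty} Q_n,
\end{aligned} \tag{2}$$

where $\epsilon > 0$ is a small constant. In the rest, we study the
upper bounds for all the three terms when $n$ is large enough.
For the first term, we have

$$\sum_{n=1}^{\frac{k}{H(p)}(1+\epsilon)} P_n \le \frac{k}{H(p)}(1+\epsilon). \tag{3}$$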



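Now let us consider the second term

$$\sum_{n=\frac{k}{H(p)}(1+\epsilon)}^{\frac{2k}{H(p)}(1+\epsilon)} Q_n \le \frac{k}{H(p)}(1+\epsilon)\, Q_{\frac{k}{H(p)}(1+\epsilon)}.$$

Using the Stirling bounds on factorials yields

$$\lim_{n\to\infty} \frac{1}{n}\log_2 \binom{n}{pn} = H(p),$$

where $H$ is the binary entropy function. Hence, following (1),
we can get

$$\lim_{n\to\infty} H\left(\frac{t_n}{n}\right) = \lim_{n\to\infty} \frac{k}{n}.$$

When $n = \frac{k}{H(p)}(1+\epsilon)$, we can write

$$\lim_{n\to\infty} H\left(\frac{t_n}{n}\right) = \frac{H(p)}{1+\epsilon},$$

which implies that

$$\lim_{n\to\infty} \frac{t_n}{n} = p - \epsilon_1$$

for some $\epsilon_1 > 0$. So there exists an $N_1$ such that for
$n > N_1$, $\frac{t_n}{n} \le p - \epsilon_1/2$.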

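By the weak law for the binomial distribution, given any
$\epsilon_2 > 0$ and $\delta > 0$, there is an $N_2$ such that for
$n > N_2$, with probability at least $1 - \delta$ there are $i$ H's
among the $n$ coin tosses such that $|\frac{i}{n} - p| \le \epsilon_2$.
Letting $\epsilon_2 = \epsilon_1/2$ and $n = \frac{k}{H(p)}(1+\epsilon)$
gives

$$Q_n \le \delta,$$

for any $\delta > 0$ when $n > \max(N_1, N_2)$.

So for any $\delta > 0$, when $k$ is large enough, we have

$$\sum_{n=\frac{k}{H(p)}(1+\epsilon)}^{\frac{2k}{H(p)}(1+\epsilon)} Q_n \le \frac{k}{H(p)}(1+\epsilon)\,\delta. \tag{4}$$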



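To calculate the third term, we notice that $Q_n$ decays very
quickly as $n$ increases when $n \ge \frac{2k}{H(p)}(1+\epsilon)$. In
this case,

$$\begin{aligned}
\frac{Q_{n+1}}{Q_n}
&= \frac{\sum_{i=0}^{t_{n+1}} \left(p^i (1-p)^{n+1-i} + (1-p)^i p^{n+1-i}\right)\binom{n+1}{i}}
        {\sum_{i=0}^{t_n} \left(p^i (1-p)^{n-i} + (1-p)^i p^{n-i}\right)\binom{n}{i}} \\
&\le \frac{\sum_{i=0}^{t_n} \left(p^i (1-p)^{n+1-i} + (1-p)^i p^{n+1-i}\right)\binom{n+1}{i}}
        {\sum_{i=0}^{t_n} \left(p^i (1-p)^{n-i} + (1-p)^i p^{n-i}\right)\binom{n}{i}} \\
&\le \max_{0 \le i \le t_n} \frac{\left(p^i (1-p)^{n+1-i} + (1-p)^i p^{n+1-i}\right)\binom{n+1}{i}}
        {\left(p^i (1-p)^{n-i} + (1-p)^i p^{n-i}\right)\binom{n}{i}} \\
&\le (1-p) \max_{0 \le i \le t_n} \frac{n+1}{n+1-i} \\
&\le \frac{(1-p)\,n}{n - t_n}.
\end{aligned}$$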



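When $n \ge \frac{2k}{H(p)}(1+\epsilon)$, we have

$$\lim_{n\to\infty} H\left(\frac{t_n}{n}\right) = \lim_{n\to\infty} \frac{k}{n} \le \frac{H(p)}{2(1+\epsilon)}.$$

This implies that when $n$ is large enough,
$H(\frac{t_n}{n}) \le \frac{H(p)}{2}$. Let us define a constant $\alpha$
such that $\alpha < \frac{1}{2}$ and $H(\alpha) = \frac{H(p)}{2}$.
Then for all $n \ge \frac{2k}{H(p)}(1+\epsilon)$, when $k$ is large
enough,

$$\frac{Q_{n+1}}{Q_n} \le \frac{1-p}{1-\alpha} < 1.$$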



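Therefore, given any $\delta > 0$, when $k$ is large enough, the
value of the third term

$$\sum_{n=\frac{2k}{H(p)}(1+\epsilon)}^{\infty} Q_n
\le Q_{\frac{2k}{H(p)}(1+\epsilon)} \sum_{i=0}^{\infty} \left(\frac{1-p}{1-\alpha}\right)^i
\le \delta\,\frac{1-\alpha}{p-\alpha}. \tag{5}$$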



Substituting (3), (4), and (5) into (2) yields that for any
$\epsilon > 0$ and $\delta > 0$, if $k$ is large enough, we have

$$E[n] \le \frac{k}{H(p)}(1+\epsilon)(1+\delta) + \delta\,\frac{1-\alpha}{p-\alpha}$$

with $\alpha < p$.

Then it is easy to get that

$$\lim_{k\to\infty} \frac{E[n]}{k} \le \frac{1}{H(p)}.$$

This completes the proof. ■ 
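
The Stirling-based limit used in the proof can be observed numerically. The short Python sketch below is an illustration with hypothetical helper names (`binary_entropy`, `normalized_log_binomial`); it rounds $pn$ to the nearest integer, which is an assumption made here so that the binomial coefficient is defined for every $n$.

```python
from math import comb, log2


def binary_entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def normalized_log_binomial(n, p):
    """(1/n) * log2 C(n, round(p*n)), which tends to H(p) as n grows."""
    return log2(comb(n, round(p * n))) / n
```

For $p = 0.3$ the gap between the two quantities shrinks from roughly $5 \times 10^{-3}$ at $n = 1000$ to below $10^{-3}$ at $n = 20000$, in line with a correction term of order $\frac{\log_2 n}{n}$.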

Theorem 13. Given a biased coin with probability $p$ being
H, let $n$ be the number of coin tosses required to generate $k$
random bits in the iterative scheme, then

$$\lim_{k\to\infty} \frac{E[n]}{k} = \frac{1}{H(p)}.$$



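Proof: First, we prove that
$\lim_{k\to\infty} \frac{E[n]}{k} \ge \frac{1}{H(p)}$. Let
$X \in \{0,1\}^*$ be the input sequence, then

$$\lim_{k\to\infty} \frac{E[n]}{H(X)} = \frac{1}{H(p)}.$$

Shannon's theory tells us that it is impossible to extract
more than $H(X)$ random bits from $X$, i.e., $H(X) \ge k$. So

$$\lim_{k\to\infty} \frac{E[n]}{k} \ge \frac{1}{H(p)}.$$

To get the conclusion in the theorem, we only need to show
that

$$\lim_{k\to\infty} \frac{E[n]}{k} \le \frac{1}{H(p)}.$$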



To distinguish the $n$ in this theorem and the one in the
previous lemma, we use $n_{(k)}$ to denote the number of coin
tosses required to generate $k$ random bits in the iterative
scheme and let $n^{\Phi}_{(k)}$ denote the number of coin tosses
required by $\Phi_k$. Let $p_m$ be the probability for $\Phi_k$
generating $m$ random bits with $0 \le m \le k$. Then we have that

$$E[n_{(k)}] = E[n^{\Phi}_{(k)}] + \sum_{m=0}^{k} p_m E[n_{(k-m)}]. \tag{6}$$

According to the algorithm, $p_k \ge \frac{1}{2}$ and
$E[n_{(k-m)}] \le E[n_{(k)}]$. Substituting them into the equation
above gives

$$E[n_{(k)}] \le E[n^{\Phi}_{(k)}] + \frac{1}{2} E[n_{(k)}],$$

i.e., $E[n_{(k)}] \le 2E[n^{\Phi}_{(k)}]$.

Now, we divide the second term in (6) into two parts such
that

$$E[n_{(k)}] \le E[n^{\Phi}_{(k)}] + \sum_{m=0}^{k-\epsilon k} p_m E[n_{(k-m)}] + \sum_{m=k-\epsilon k}^{k} p_m E[n_{(k-m)}],$$

for a constant $\epsilon > 0$. In which,

$$\sum_{m=0}^{k-\epsilon k} p_m E[n_{(k-m)}] \le \left(\sum_{m=0}^{k-\epsilon k} p_m\right) 2E[n^{\Phi}_{(k)}],$$

$$\sum_{m=k-\epsilon k}^{k} p_m E[n_{(k-m)}] \le 2E[n^{\Phi}_{(\epsilon k)}].$$

Hence

$$E[n_{(k)}] \le E[n^{\Phi}_{(k)}] + \left(\sum_{m=0}^{k-\epsilon k} p_m\right) 2E[n^{\Phi}_{(k)}] + 2E[n^{\Phi}_{(\epsilon k)}]. \tag{7}$$



Given $k$, all the possible input sequences are divided into $w$
prefix sets $S_1, S_2, \ldots, S_w$, where $w$ can be an infinite
number. Given an input sequence $X \in S_i$ for $1 \le i \le w$, we are
considering the probability for $\Phi_k$ generating a sequence of
length $m$.

In our algorithm, $|S_i| \ge 2^k$. Assume

$$|S_i| = a_k 2^k + a_{k-1} 2^{k-1} + \ldots + a_0 2^0,$$

where $a_k \ge 1$ and $0 \le a_0, a_1, \ldots, a_{k-1} \le 1$. Given the
condition $X \in S_i$, we have

$$\sum_{m=0}^{k-\epsilon k} p_m = \frac{\sum_{i=0}^{k-\epsilon k} a_i 2^i}{\sum_{i=0}^{k} a_i 2^i} \le \frac{2^{k-\epsilon k+1}}{2^k + 2^{k-\epsilon k+1}} \le \frac{2^{k-\epsilon k+1}}{2^k}.$$

So given any $\delta > 0$, when $k$ is large enough, we have

$$\sum_{m=0}^{k-\epsilon k} p_m \le \delta. \tag{8}$$

Although we reach this conclusion for $X \in S_i$, this conclusion
holds for any $S_i$ with $1 \le i \le w$. Hence, we are able to remove
this constraint that $X \in S_i$.

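According to the previous lemma, for any $\delta > 0$, when $k$
is large enough, we have

$$\frac{E[n^{\Phi}_{(\epsilon k)}]}{\epsilon k} \le \frac{1}{H(p)} + \delta, \tag{9}$$

$$\frac{E[n^{\Phi}_{(k)}]}{k} \le \frac{1}{H(p)} + \delta. \tag{10}$$

Substituting (8), (9), and (10) into (7) gives us

$$E[n_{(k)}] \le k\left(\frac{1}{H(p)} + \delta\right)(1 + 2\delta) + 2k\epsilon\left(\frac{1}{H(p)} + \delta\right).$$

From which, we obtain

$$\lim_{k\to\infty} \frac{E[n]}{k} = \lim_{k\to\infty} \frac{E[n_{(k)}]}{k} \le \frac{1}{H(p)}.$$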



This completes the proof. ■ 

The theorem above shows that the iterative scheme is
asymptotically optimal, i.e., the expected number of coin
tosses for generating $k$ random bits approaches the
information-theoretic lower bound $k/H(p)$ when $k$ becomes large.
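
For small $k$, the expected number of tosses read by step (1) of $\Phi_k$ can be computed directly and compared with $k/H(p)$. The Python sketch below is an illustration only: the function names are hypothetical, it covers a single call of $\Phi_k$ rather than the whole iterative scheme, and it truncates the sum $E[n] = \sum_{t \ge 0} \Pr[n > t]$ after `max_len` terms, so it slightly underestimates the true value when $p$ is close to 0 or 1.

```python
from math import comb, log2


def meets_criterion(k1, k2, k):
    """C(k1+k2, k1) >= 2^k (k1+k2) / min(k1, k2), unmet when min is 0."""
    m = min(k1, k2)
    return m > 0 and comb(k1 + k2, k1) * m >= (1 << k) * (k1 + k2)


def binary_entropy(p):
    """Binary entropy H(p) in bits for 0 < p < 1."""
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)


def expected_tosses_step1(k, p, max_len=2000):
    """Approximate E[n] for step (1) as the sum over t of Pr[n > t]."""
    alive = {(0, 0): 1.0}  # probability mass of count pairs not yet stopped
    expected = 0.0
    for _ in range(max_len):
        if not alive:
            break
        expected += sum(alive.values())  # Pr[at least one more toss is read]
        nxt = {}
        for (a, b), mass in alive.items():
            for key, q in (((a + 1, b), p), ((a, b + 1), 1.0 - p)):
                if not meets_criterion(key[0], key[1], k):
                    nxt[key] = nxt.get(key, 0.0) + mass * q
        alive = nxt
    return expected
```

With a fair coin and $k = 1$, the rule stops once both faces have appeared twice, and the sketch returns 5.5 tosses against a bound of $k/H(p) = 1$; the ratio to the bound is large for such a small $k$ and is only claimed to approach 1 asymptotically.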



V. Conclusion 

In this paper, we have presented a universal scheme that 
transforms an arbitrary algorithm for 2-faced coins to generate 
random bits from general m-sided dice, hence enabling the 
application of existing algorithms to general sources. Although
a similar question has been studied before, as in [5], their
solution can only be applied to a specific algorithm, i.e.,
Elias's algorithm.

The second contribution of this paper is an efficient
algorithm for generating a prescribed number of random bits
from an arbitrary biased coin. In many applications, this is
a natural way of considering the problem of random bit
generation from biased coins, but it is not well studied in
the literature. This problem is similar to the one studied in
universal variable-to-fixed length codes, which are used to
parse an infinite sequence into variable-length phrases. Each
phrase is then encoded into a fixed number of bits. In Q,
Lawrence devised a variable-to-fixed length code for the
class of binary memoryless sources (biased coins), which
is based on Pascal's triangle (so is our algorithm). Tjalkens
and Willems [11] modified Lawrence's algorithm as a more
natural and simple implementation, and they showed that the
rate of the resulting code converges asymptotically optimally
fast to the source entropy. These universal variable-to-fixed
length codes are probably capable of generating random bits
asymptotically in some (weak) sense, namely, the random bits
generated in this way are not perfect, and they cannot satisfy
the typical requirement based on statistical distance (widely
used in computer science).